Academic Network Representation via Prediction-Sampling Incorporated Tensor Factorization

Zhang, Chunyang, Liao, Xin, Wu, Hao

arXiv.org Artificial Intelligence

Accurate representation of an academic network is of great significance to academic relationship mining, such as predicting scientific impact. A Latent Factorization of Tensors (LFT) model is one of the most effective models for learning the representation of a target network. However, an academic network is often High-Dimensional and Incomplete (HDI) because the relationships among its numerous entities cannot be fully explored, making it difficult for an LFT model to learn an accurate representation of the academic network. To address this issue, this paper proposes a Prediction-sampling-based Latent Factorization of Tensors (PLFT) model with two ideas: 1) constructing a cascade LFT architecture that enhances representation learning by capturing the academic network's hierarchical features, and 2) introducing a nonlinear-activation-incorporated prediction-sampling strategy that learns the network representation more accurately by generating new academic network data layer by layer. Experimental results on three real-world academic network datasets show that the PLFT model outperforms existing models when predicting the unexplored relationships among network entities.
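The paper's own formulation is not reproduced here, but the base LFT idea — approximating each observed entry of a sparse, high-dimensional tensor by an inner product of low-rank latent factors, trained only on the observed entries — can be sketched as follows. All names, hyperparameters, and the synthetic setup are illustrative; this is a plain LFT baseline, not the PLFT model itself:

```python
import random

def lft_train(observed, shape, rank=4, lr=0.05, reg=0.01, epochs=1000, seed=0):
    """Fit latent factor matrices U, V, W so that
    sum_r U[i][r] * V[j][r] * W[k][r] approximates each observed entry.
    `observed` maps (i, j, k) index triples to values; unobserved entries
    of the HDI tensor are simply never touched, as in LFT models."""
    rng = random.Random(seed)
    I, J, K = shape
    U = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(I)]
    V = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(J)]
    W = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(K)]
    for _ in range(epochs):
        for (i, j, k), x in observed.items():
            pred = sum(U[i][r] * V[j][r] * W[k][r] for r in range(rank))
            err = x - pred
            for r in range(rank):
                u, v, w = U[i][r], V[j][r], W[k][r]  # pre-update values
                U[i][r] += lr * (err * v * w - reg * u)
                V[j][r] += lr * (err * u * w - reg * v)
                W[k][r] += lr * (err * u * v - reg * w)
    return U, V, W

def lft_predict(U, V, W, i, j, k):
    """Estimate a (possibly unobserved) entry from the learned factors."""
    return sum(u * v * w for u, v, w in zip(U[i], V[j], W[k]))
```

On a small synthetic low-rank tensor this recovers the observed entries closely; the prediction-sampling cascade described in the abstract would sit on top of such a base factorizer, generating new layer-by-layer training data.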


Learning Cross-Task Generalities Across Graphs via Task-trees

Wang, Zehong, Zhang, Zheyuan, Ma, Tianyi, Chawla, Nitesh V, Zhang, Chuxu, Ye, Yanfang

arXiv.org Artificial Intelligence

Foundation models aim to create general, cross-task, and cross-domain machine learning models by pretraining on large-scale datasets to capture shared patterns or concepts (generalities), such as contours, colors, textures, and edges in images, or tokens, words, and sentences in text. However, discovering generalities across graphs remains challenging, which has hindered the development of graph foundation models. To tackle this challenge, in this paper we propose a novel approach to learning generalities across graphs via task-trees. Specifically, we define the basic learning instances in graphs as task-trees and assume that the generalities shared across graphs are, at least partially, preserved in the task-trees of the given graphs. To validate this assumption, we first perform a theoretical analysis of task-trees in terms of stability, transferability, and generalization. We find that if a graph neural network (GNN) model is pretrained on diverse task-trees through a reconstruction task, it can learn sufficient transferable knowledge for downstream tasks given an appropriate set of fine-tuning samples. To validate the assumption empirically, we instantiate the theorems by developing a cross-task, cross-domain graph foundation model named Graph generality Identifier on task-Trees (GIT). Extensive experiments on 30 graphs from five domains demonstrate the effectiveness of GIT in fine-tuning, in-context learning, and zero-shot learning scenarios. In particular, the general GIT model pretrained on large-scale datasets can be quickly adapted to specific domains, matching or even surpassing expert models designed for those domains. Our data and code are available at https://github.com/Zehong-Wang/GIT.
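As a rough illustration of the tree-shaped learning instance (my own sketch, not the paper's formal definition): a node's neighborhood can be unfolded into a rooted tree, so that a node reached along two different paths appears twice — mirroring how message passing in a GNN sees it — and the tree, rather than the raw subgraph, becomes the basic instance:

```python
def task_tree(adj, root, depth=2):
    """Unfold the neighborhood of `root` into a rooted tree of the given
    depth. `adj` maps each node to its neighbor list. Each tree node is a
    pair (node_id, children); unlike a subgraph, the same graph node may
    appear in several branches."""
    def expand(node, parent, d):
        if d == 0:
            return (node, [])
        # Skip the immediate parent so branches do not bounce straight back.
        children = [expand(nb, node, d - 1) for nb in adj[node] if nb != parent]
        return (node, children)
    return expand(root, None, depth)

# Example: in a triangle, node 2 shows up in both branches of node 0's tree.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
tree = task_tree(triangle, 0, depth=2)
```

A pretraining loop in the spirit of the abstract would then reconstruct node features over many such trees sampled from diverse graphs.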


Learning Multiplex Embeddings on Text-rich Networks with One Text Encoder

Jin, Bowen, Zhang, Wentao, Zhang, Yu, Meng, Yu, Zhao, Han, Han, Jiawei

arXiv.org Artificial Intelligence

In real-world scenarios, texts in a network are often linked by multiple semantic relations (e.g., papers in an academic network are referenced by other publications, written by the same author, or published in the same venue), where the text documents and their relations form a multiplex text-rich network. Mainstream text representation learning methods use pretrained language models (PLMs) to generate one embedding for each text unit, expecting all types of relations between texts to be captured by these single-view embeddings. However, this presumption does not hold, particularly in multiplex text-rich networks. Along another line of work, multiplex graph neural networks (GNNs) directly initialize node attributes as a feature vector for node representation learning, but they cannot fully capture the semantics of the nodes' associated texts. To bridge these gaps, we propose METERN, a new framework for learning Multiplex Embeddings on TExt-Rich Networks. In contrast to existing methods, METERN uses a single text encoder to model the knowledge shared across relations and a small number of parameters per relation to derive relation-specific representations. This allows the encoder to effectively capture the multiplex structure of the network while preserving parameter efficiency. We conduct experiments on nine downstream tasks over five networks from the academic and e-commerce domains, where METERN consistently and significantly outperforms the baselines. The code is available at https://github.com/PeterGriffinJin/METERN-submit.
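The shared-encoder-plus-light-relation-parameters design can be caricatured in a few lines. Here the shared encoder's output vector is taken as given, and each relation owns only a small parameter vector (a diagonal transform); the class name and the choice of a diagonal transform are assumptions for illustration, not METERN's actual parameterization:

```python
import numpy as np

class MultiplexEmbedder:
    """One shared embedding per node, plus `dim` parameters per relation
    that derive a relation-specific view of that shared embedding."""

    def __init__(self, dim, relations, seed=0):
        rng = np.random.default_rng(seed)
        # Per-relation parameters: dim numbers each, far fewer than a
        # separate full encoder per relation would require.
        self.rel_scale = {r: rng.normal(1.0, 0.1, dim) for r in relations}

    def embed(self, shared_vec, relation):
        # Relation-specific view = elementwise rescaling of the shared vector.
        return self.rel_scale[relation] * shared_vec

# Example: the same shared text embedding yields different views per relation.
model = MultiplexEmbedder(dim=4, relations=["cites", "same_author", "same_venue"])
shared = np.ones(4)
cites_view = model.embed(shared, "cites")
```

In training, each relation's view would be scored against linked neighbors under that relation, so the shared parameters absorb common semantics while the small per-relation vectors specialize.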


Modeling Dynamic Heterogeneous Graph and Node Importance for Future Citation Prediction

Geng, Hao, Wang, Deqing, Zhuang, Fuzhen, Ming, Xuehua, Du, Chenguang, Jiang, Ting, Guo, Haolong, Liu, Rui

arXiv.org Artificial Intelligence

Accurate citation count prediction for newly published papers could help editors and readers quickly identify papers that will become influential. Although many approaches have been proposed to predict a paper's future citations, most ignore the dynamic heterogeneous graph structure or node importance in academic networks. To address this problem, we propose a Dynamic heterogeneous Graph and Node Importance network (DGNI) learning framework, which fully leverages dynamic heterogeneous graph and node importance information to predict the future citation trends of newly published papers. First, a dynamic heterogeneous network embedding module captures the evolutionary trends of the whole academic network. Then, a node importance embedding module captures global consistency relationships to estimate each paper's importance. Finally, the evolutionary trend embeddings and node importance embeddings are combined to jointly predict each paper's future citation counts through a log-normal distribution model conditioned on the multi-faceted paper node representations. Extensive experiments on two large-scale datasets demonstrate that our model significantly outperforms state-of-the-art models on all metrics.
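The final step — turning a paper's node representation into a citation forecast through a log-normal distribution — might look roughly like the sketch below. The two linear heads and the softplus link are hypothetical stand-ins for whatever parameterization DGNI actually uses; only the log-normal mean formula is standard:

```python
import math

def lognormal_mean(mu, sigma):
    # E[X] for X ~ LogNormal(mu, sigma): exp(mu + sigma^2 / 2).
    return math.exp(mu + sigma ** 2 / 2.0)

def predict_citations(paper_vec, w_mu, w_sigma):
    """Map a paper's node representation to log-normal parameters with two
    linear heads (hypothetical), then return the expected citation count.
    A softplus keeps the predicted sigma strictly positive."""
    mu = sum(x * w for x, w in zip(paper_vec, w_mu))
    raw = sum(x * w for x, w in zip(paper_vec, w_sigma))
    sigma = math.log1p(math.exp(raw))  # softplus(raw) > 0
    return lognormal_mean(mu, sigma)
```

Fitting such heads by maximizing log-normal likelihood over observed citation histories is one common way to train this kind of output layer; the abstract's trend and importance embeddings would be concatenated into `paper_vec`.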


Heterogeneous Graph Learning for Explainable Recommendation over Academic Networks

Chen, Xiangtai, Tang, Tao, Ren, Jing, Lee, Ivan, Chen, Honglong, Xia, Feng

arXiv.org Artificial Intelligence

With the explosive growth in the number of new graduates with research degrees every year, early-career researchers face unprecedented challenges in finding a job at a suitable institution. This study aims to understand the behavior of academic job transitions and to recommend suitable institutions for PhD graduates. Specifically, we design a deep learning model to predict the career moves of early-career researchers and provide suggestions. The design is built on top of scholarly/academic networks, which contain abundant information about scientific collaboration among scholars and institutions. We construct a heterogeneous scholarly network to facilitate exploring career-move behavior and recommending institutions to scholars. We devise an unsupervised learning model called HAI (Heterogeneous graph Attention InfoMax), which combines an attention mechanism with mutual information maximization for institution recommendation. Moreover, we propose scholar attention and meta-path attention to discover the hidden relationships among meta-paths. With these mechanisms, HAI provides ordered recommendations with explainability. We evaluate HAI on a real-world dataset against baseline methods. Experimental results verify the effectiveness and efficiency of our approach.
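Meta-path attention of the kind described — soft weights over per-meta-path embeddings, where the weights themselves double as an explanation of which meta-path drove a recommendation — can be sketched as follows. The query vector and meta-path names are illustrative; HAI's actual scoring and the InfoMax objective are more involved:

```python
import numpy as np

def meta_path_attention(path_embs, attn_vec):
    """Fuse per-meta-path node embeddings with softmax attention weights.
    `path_embs`: dict mapping meta-path name -> embedding vector.
    `attn_vec`: a (hypothetical) learned query vector of the same dimension.
    Returns the fused embedding and the per-path weights; the weights are
    the explanation attached to a recommendation."""
    names = list(path_embs)
    scores = np.array([float(np.dot(attn_vec, path_embs[n])) for n in names])
    exp = np.exp(scores - scores.max())  # numerically stable softmax
    weights = exp / exp.sum()
    fused = sum(w * path_embs[n] for w, n in zip(weights, names))
    return fused, dict(zip(names, weights))

# Example: two meta-paths (author-paper-author and author-paper-venue-paper-author).
embs = {"APA": np.array([1.0, 0.0]), "APVPA": np.array([0.0, 1.0])}
fused, weights = meta_path_attention(embs, attn_vec=np.array([2.0, 0.0]))
```

In this toy example the query aligns with the APA embedding, so the collaboration meta-path receives the larger weight — exactly the kind of signal an explainable recommender can surface to the user.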